Learning a Robust Word Sense Disambiguation Model using Hypernyms in Definition Sentences

Authors

  • Kiyoaki Shirai
  • Tsunekazu Yagi
Abstract

This paper proposes a method to improve the robustness of a word sense disambiguation (WSD) system for Japanese. Two WSD classifiers are trained from a word sense-tagged corpus: one is obtained by supervised learning, and the other uses hypernyms extracted from definition sentences in a dictionary. The former is suited to disambiguating high-frequency words, while the latter is appropriate for low-frequency words. A robust WSD system is constructed by combining these two classifiers. In our experiments, the F-measure and applicability of the proposed method were 3.4% and 10% higher, respectively, than those of a single classifier obtained by supervised learning.
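
The abstract does not say how the two classifiers are combined, so the following is only a minimal sketch under stated assumptions: a simple back-off in which the supervised classifier answers for words seen often enough in training and the hypernym-based classifier handles the rest. The names `combine`, `evaluate`, and `min_count` are hypothetical, and "applicability" is assumed here to mean the share of instances for which the system returns any answer.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical classifier type: (target word, context words) -> sense label,
# or None when the classifier abstains.
Classifier = Callable[[str, List[str]], Optional[str]]


def combine(supervised: Classifier,
            hypernym_based: Classifier,
            train_counts: Counter,
            min_count: int = 5) -> Classifier:
    """Back-off combination (an assumption, not the paper's stated method)."""
    def classify(word: str, context: List[str]) -> Optional[str]:
        # Trust the supervised model only for words with enough training data.
        if train_counts[word] >= min_count:
            sense = supervised(word, context)
            if sense is not None:
                return sense
        # Otherwise fall back to the hypernym-based model.
        return hypernym_based(word, context)
    return classify


def evaluate(classifier: Classifier,
             gold: List[Tuple[str, List[str], str]]) -> Tuple[float, float]:
    """Return (F-measure, applicability) over gold (word, context, sense) triples."""
    answered = correct = 0
    for word, context, sense in gold:
        predicted = classifier(word, context)
        if predicted is None:
            continue  # abstentions lower recall and applicability
        answered += 1
        correct += int(predicted == sense)
    total = len(gold)
    precision = correct / answered if answered else 0.0
    recall = correct / total if total else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    applicability = answered / total if total else 0.0
    return f_measure, applicability
```

A frequency threshold is only one possible design; a confidence-based vote between the two classifiers would be another, and nothing in the abstract indicates which the authors used.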

Similar resources

Word Sense Disambiguation Using WordNet Relations

In this paper, the “Weighted Overlapping” disambiguation method is presented and evaluated. This method extends Lesk’s approach to disambiguate a specific word appearing in a context (usually a sentence). Sense definitions of the specific word, “Synset” definitions, the “Hypernymy” relation, and definitions of the context features (words in the same sentence) are retrieved from the WordNe...

Word Sense Disambiguation Using Vectors of Co-occurrence Information

This paper reports on the word sense disambiguation of Korean nouns using co-occurrence information in context. For a given noun, the local contextual word distribution is not enough to express its semantic characteristics for noun sense disambiguation. This paper proposes a cluster-based sense as a base vector. Contextual noise is removed by a term weighting method, and hypernyms of remain...

Automatic Classification of Bengali Sentences Based on Sense Definitions Present in Bengali WordNet

Based on the sense definitions of words available in the Bengali WordNet, an attempt is made to classify Bengali sentences automatically into different groups in accordance with their underlying senses. The input sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of ...

Automatic Idiom Identification in Wiktionary

Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed ...

Analogical Word Sense Disambiguation

Word sense disambiguation is an important problem in learning by reading. This paper introduces analogical word-sense disambiguation, which uses human-like analogical processing over structured, relational representations to perform word sense disambiguation. Cases are automatically constructed using representations produced via natural language analysis of sentences, and include both conceptua...

Publication date: 2004